To know how closely related are a set of taxa using phylogenetics, we need to first consider the rapid increment in probable topologies. For instance, 50 taxa translate into 2.752921e+76 rooted trees or 2.838063e+74 unrooted trees. Estimating the true topology from all the possible configurations of taxa in phylogenies is computationally challenging and thus requires optimization of search strategies.
First we will run the formula estimating total numbers of unrooted versus rooted trees for n number of taxa:
#Step 1: Turn formula to estimate the number of unrooted/rooted trees into functions
funrooted<-function(n)
{
factorial(2*n-5)/((2^(n-3))*factorial(n-3))
}
frooted<-function(n)
{
factorial(2*n-3)/((2^(n-2))*factorial(n-2))
}
#frooted(50) #try out a number
#funrooted(50)
Now we can move on to visualizing the outputs into a plot by means of a simulated mini data set.
If we simulate a sequence of numbers and plot using the functions above: the rapid increase in the numbers of trees with the number of taxa looks weird. What happens is that there is wide variation in the values for y that the majority of points then collapse near zero, leaving one big gap between the first and second y axis tick mark. An animated figure showing the process that leads to this is then easier to interpret. Here is how I have approached this issue using the R package gganimate:
#load packages
library("reshape2")
library("ggplot2")
library("gganimate")
library("magick")
## Linking to ImageMagick 6.9.11.57
## Enabled features: cairo, fontconfig, freetype, heic, lcms, pango, raw, rsvg, webp
## Disabled features: fftw, ghostscript, x11
#Simulate data
x <- seq(3,50,1)
#create simple data frame
df<-data.frame(taxa=x, unrooted_trees=funrooted(x),rooted_trees=frooted(x))
#turn wide df into a long format for plotting purposes
df_long <- melt(df, id="taxa")
#Establish scientific notation
options(scipen=10)
#ggplot
p <- ggplot(df_long, aes(x=taxa, y=value,color=variable))+
geom_line( color="gray", lty="dashed") +
labs(y = "Number of trees", x = "Number of taxa") +
geom_text(aes(label=value),hjust=0, vjust=0)+
scale_color_manual(values = c("unrooted_trees" = "#2A00B6", "rooted_trees" = "#E94657",name = "")) +
geom_point(aes(group = seq_along(variable)), size = 3, alpha = 0.7,shape=21)+
theme_bw() +
theme(legend.title=element_blank(),plot.title = element_text(size = 24, family = "Avenir"),
text = element_text(size = 24,family = "Avenir"))
##Animated ggPlot
p + transition_reveal(along = taxa) + view_follow(fixed_x = TRUE)
#save
#anim<-animate(p2,height = 800, width =800)
#magick::image_write(anim, path="trees.gif")
Voilà! Note that the values of y, or numbers of possible tree topologies dynamically appear as do the data points and the y axis changes depending on the data!
#Citations
citation("reshape2")
citation("ggplot2")
citation("gganimate")
citation("magick")